Corpus-supported academic writing: how can technology help?
Abstract. Phraseology has long been used in L2 teaching of academic writing,
and corpus linguistics has played a major role in the compilation and assessment
of academic phrases. However, there are only a few interactive academic writing
tools in which corpus methodology is implemented in a real-time design to support
formulation processes. In this paper, we describe several corpus-related methods
that we have developed and implemented as part of an interactive thesis-writing
tool, Thesis Writer, designed and constructed jointly by the Language Competence
Centre and the Center for Innovative Teaching and Learning of the Zurich University
of Applied Sciences in Switzerland. Thesis Writer (TW) hosts several linguistic-support
tools and is designed in its first pilot version to support thesis writing in
economics with the help of two self-compiled corpora in English and German.
Students can access the corpora directly via the IMS Open Corpus Workbench
or via a pre-selected collection of central rhetorical elements through the phrase
book. Several search options and tutorials have been tested and included in the
TW platform: the corpus simple search tool, the corpus syntactic search tool, and
the academic phrasebook. In the case of the latter, a new methodology led to the
identification of lists of phrases distributed in research-cycle sections of the thesis.

Keywords: academic writing, corpus linguistics, language instruction, thesis

1. Introduction

Since the 1970s, when the teaching of writing began to shift from a text-oriented 
to a process-centered approach, writing instruction has largely abstained from 
a direct teaching of language. From this time on, the interest of teachers and 
researchers has focused on what the writers do and think rather than on the 
linguistic or textual means they use. Recently, several experts have demanded a 
reconsideration of the role of language in writing (e.g. Feilke 2010, 2012, 2014; 
Hyland, 2000; Myhill & Fischer, 2010; Steinhoff, 2007) and started to develop 
theoretical and educational models. However, what a linguistically informed
writing process might look like, and how the formulation process can be
supported by knowledge about language, remains to be explored. As Feilke
(2010, 2012) suggested, a plausible assumption is that writing relies on a large
number of routinized textual procedures, which serve rhetorical and structural
functions in the construction of meaning.

Corpus linguistics provides several effective approaches at the interface of
research in learner language and academic writing, which can be used to
identify such routines as phrases, chunks, and collocations. The results of corpus
linguistics in the works of Swales (1990, 2004), Hyland (2000), Granger, Hung
and Petch-Tyson (2002), Steinhoff (2007), Biber and Conrad (2009), Lüdeling
and Walter (2009), Römer and Wulff (2010), Nesi and Gardner (2012), and many
others have provided us with insights into the linguistic patterns and resources
used by certain communities to solve domain-specific rhetorical problems. By
using corpus linguistics, language teaching enters a new technological territory
with multiple facets that can be applied and tested: (a) strategies of the
computer-assisted language learning (CALL) framework (cf. Beatty, 2003) or
(b) Data-Driven Learning (DDL) (cf. Johns, 1986).

However, technology has scarcely been exploited for interactive tools that support
academic writing linguistically (e.g. see Hsieh & Liou, 2009, for a presentation of
the POWER and CARE tools). In this study, we describe several methods of
analysis that can be applied to corpus linguistics results so that they can be used
to facilitate academic writing tasks for students writing in English or German (as
L1 and/or L2). The methods have been implemented in the interactive
academic-writing tool, Thesis Writer, designed and constructed jointly by the Department
of Applied Linguistics and the Center for Innovative Teaching and Learning of
the Zurich University of Applied Sciences in Switzerland. The tool is designed to
help students who use either English or German (both as L1 and L2) to write their
bachelor's or master's theses in economics.




2. Method 


2.1. Brief description of the academic writing tool 

Thesis Writer is primarily a learning platform, but it can also be used as a research 
tool to collect and analyze data about academic writing. Thesis Writer (Figure 1) 
supports students by (1) structuring the writing process; (2) providing short 
tutorials for all major steps and actions; (3) offering a “proposal wizard” to guide 
students through the critical issues of the thesis proposal structure; (4) supporting 
the transfer of the proposal into the final version of the thesis; and (5) offering help 
with organizing and revising the thesis. 


Figure 1. Road map of Thesis Writer 
[Figure content: “Thesis writing: The whole process - and how ...”; steps shown include “1. Understand the structure of a ...”, “2. State your research question”, “4. Review and write a literature ...”, “5. Follow the proposal structure”.]

2.2. Technical details

Thesis Writer runs on a LAMP stack (Linux, Apache, MySQL, PHP) and was
developed with the PHP-based framework Yii, following strict design patterns
for object-oriented programming and the principles of model-view-controller
(for more details, see Rapp, Kruse, Erlemann, & Ott, 2015).
What happens from a technical perspective when a user seeks language-sensitive
linguistic support in Thesis Writer by highlighting a word or a passage and clicking
the linguistic-support button? The corpus is stored in a database. IMS Open Corpus
Workbench (CWB^4) provides a number of command-line tools for running queries
and other operations on the corpus. To allow the user to perform queries by
highlighting text and using the linguistic support, a collection of Perl scripts
and a number of PHP classes mediate between the GUI of Thesis Writer
and the command-line tools of CWB. To improve the quality of the suggestions made
to the user, we use TreeTagger^5 to tag the user’s entire text for parts of speech.
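
The mediation step described above can be sketched as follows. This is only an illustration in Python, not the Perl and PHP code used on the platform. The function name `build_cqp_query` is hypothetical, and the sketch merely builds a query string in CQP syntax (one `[word="..."]` expression per token, with the `%c` flag for case-insensitive matching). It does not call any CWB tool.

```python
import re


def build_cqp_query(selection: str, ignore_case: bool = True) -> str:
    """Turn highlighted text into a CQP token-sequence query (hypothetical helper)."""
    # Keep only runs of word characters, so that no regex metacharacters
    # or quotation marks from the user's text end up inside the query.
    tokens = re.findall(r"\w+", selection)
    if not tokens:
        raise ValueError("selection contains no searchable words")
    flag = "%c" if ignore_case else ""
    # One token expression per word, separated by spaces, terminated by ';'.
    return " ".join(f'[word="{tok}"{flag}]' for tok in tokens) + ";"


if __name__ == "__main__":
    print(build_cqp_query("diese Frage"))
```

In a setup like the one described, a string of this kind would be handed to the corpus back-end by the mediating scripts. How Thesis Writer actually constructs its queries is not specified in this paper.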


2.3. Corpus simple search tool 

One of the simplest platform-mediated corpus methods is the free
word-in-context search. The IT specialists in the team helped us design and
integrate a user-friendly “linguistic support” button, so that registered users
of Thesis Writer can run linguistic searches directly on the platform, with CWB
processing the data in the back-end (Figure 2).
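
To illustrate what a word-in-context search returns, here is a minimal keyword-in-context (KWIC) sketch in Python. It works on a toy in-memory token list rather than on CWB, the sample sentence is invented for the illustration, and the function name `kwic` and its `window` parameter are assumptions, not part of Thesis Writer.

```python
def kwic(tokens, keyword, window=5):
    """Return (left, match, right) context triples for each hit of keyword."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            # Take at most `window` tokens on each side of the hit.
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            hits.append((" ".join(left), tok, " ".join(right)))
    return hits


# Toy sentence, invented for this illustration (not taken from the corpora).
sample = "In order to answer this question we first review the literature".split()

if __name__ == "__main__":
    for left, match, right in kwic(sample, "question", window=3):
        print(f"{left} [{match}] {right}")
```

Keeping `window` small limits how much running text is shown around each hit, so the display stays focused on the immediate context of the search word.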

Figure 2. Linguistic support tool 
[Screenshot of the Thesis Writer editor showing a German-language sample passage on currency crises (vgl. Sell 1999, S. 2).]


2.4. Corpus syntactic search tool 

Still in the testing stage, this tool is intended to offer students the option to look
for recurrent syntactic patterns, if research demonstrates that such patterns affect
the quality of student writing. We looked at [Adj. + Noun] patterns and found
that the syntactic string is quite prolific. One of the challenges at this stage is
also the correct retrieval of the desired POS patterns and the elimination of errors
from the list (see the case “schwer” at the end of the search list in Figure 3). A
computational linguist is currently working on solving this matter. The technical
solution for the integration in Thesis Writer will be implemented by the end of
2015.

4. More information: http://www.ims.uni-stuttgart.de/forschung/projekte/CorpusWorkbench.html

5. More information: Schmid (1994).
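
A pattern search of this kind can be sketched in Python as follows, assuming the text has already been tagged as (word, tag) pairs. The tags follow the STTS tagset used for German (ADJA for attributive adjectives, ADJD for predicative or adverbial adjectives, NN for common nouns). The tagged sentence is hand-written for the illustration and is not actual tagger output, and the function name `find_adj_noun` is hypothetical.

```python
from collections import Counter


def find_adj_noun(tagged, adj_tags=("ADJA",), noun_tags=("NN",)):
    """Count adjacent (adjective, noun) pairs in a list of (word, tag) tuples."""
    pairs = Counter()
    # Walk over neighbouring token pairs and keep those matching the pattern.
    for (w1, t1), (w2, t2) in zip(tagged, tagged[1:]):
        if t1 in adj_tags and t2 in noun_tags:
            pairs[(w1.lower(), w2)] += 1
    return pairs


# Hand-tagged toy input (an assumption, not real TreeTagger output).
tagged = [
    ("Die", "ART"), ("schweren", "ADJA"), ("Krisen", "NN"),
    ("hatten", "VAFIN"), ("fatale", "ADJA"), ("Folgen", "NN"), (".", "$."),
    ("Das", "PDS"), ("ist", "VAFIN"), ("schwer", "ADJD"), (".", "$."),
]

if __name__ == "__main__":
    for (adj, noun), freq in find_adj_noun(tagged).most_common():
        print(adj, noun, freq)
```

Restricting the adjective slot to ADJA is one possible way to keep predicative forms out of the result list. Whether this addresses the specific retrieval errors mentioned above is not something the sketch can establish.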

Figure 3. Syntactic search in CWB 

2.5. Construction of an academic phrase book

A more complex linguistic-support method is the list-of-phrases generator, which
provides useful academic phraseology when users are working on certain sections
of their papers. The phrase book is comparable to the Academic Phrasebank of
the University of Manchester^6, but it differs from it significantly because the lists
of academic phrases compiled for Thesis Writer are organised according to the
section of the thesis for which they are generally typical. The methodology used
to compile the academic phrase book involves several analysis steps:

• Academic phrases in theory: Because the self-compiled corpora are not yet
content-annotated (e.g. for academic phrases), a pre-selection stage was
needed before the most frequent academic chunks could be identified. This
stage involved collecting information on academic-writing phraseology from
textbooks^7 and online informative materials^8.

• Academic phrases within the research cycle: In a further intermediate
processing stage, the lists of phrases extracted from the literature were
rearranged to match the sections in Thesis Writer: (1) Topic/Research
Question, (2) Relevance, (3) Research Gap/Knowledge Gap, (4) State of the
Art, (5) Method/Procedure, (6) Discussion, (7) Results, and (8) Conclusions.
Each section also includes subcategories of phrases (see Table 1 below).

• Keywords in academic phrase lists: By analyzing the list of phrases resulting
from the rearrangement within the research-cycle categories, we were able
to identify several academic keywords.

• Academic phrase book construction: Using corpus linguistics methodology,
each identified keyword was analysed with the help of concordance^9
software, which can run instant searches in the self-compiled English and
German corpora. Two main strategies led to the identification of the most
relevant academic phrases: (a) the software retrieved the clusters in which
the given keyword occurred; (b) the most frequent collocation patterns
within five words to the left and right of the keyword were extracted. From
the compiled lists, the most frequent and/or most typical academic-writing
phrases were selected and compiled into a discipline-specific academic
phrase book.

6. More information: http://www.phrasebank.manchester.ac.uk/

7. For instance, Bigler and Bugmann (2007).

8. For instance, for academic writing in German, one source of information was bab.la (Schroeter & Uecker, n.d.).
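
The two retrieval strategies in the last step can be sketched in Python as follows. This is a simplified illustration, not the concordance software used in the study: the function names `collocates` and `clusters` are hypothetical, and the toy token list is built from the example phrases in Table 1 instead of the full corpora.

```python
from collections import Counter


def collocates(tokens, keyword, span=5):
    """Count words within `span` tokens to the left or right of each keyword hit."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == keyword:
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            counts.update(window)
    return counts


def clusters(tokens, keyword, n=3):
    """Count n-word clusters (n-grams) that contain the keyword."""
    counts = Counter()
    for i in range(len(tokens) - n + 1):
        gram = tuple(tokens[i:i + n])
        if keyword in gram:
            counts[gram] += 1
    return counts


# Toy input assembled from the Table 1 example phrases (not the real corpus).
text = ("um diese Frage zu beantworten . "
        "zur Beantwortung dieser Frage . "
        "die Frage , ob . "
        "Antwort auf diese Frage").split()

if __name__ == "__main__":
    print(collocates(text, "Frage", span=5).most_common(3))
    print(clusters(text, "Frage", n=2).most_common(3))
```

The sketch lets windows cross sentence boundaries and counts raw frequencies only. A real analysis would normally respect sentence limits and might rank candidates with an association measure, choices that the paper leaves open.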


Table 1. Academic phrases within the research cycle

                                 German                               Translation EN
Main category in research cycle  Fragestellung/Forschungsfrage        Topic / Research Question
Subcategory in research cycle    Einleitung Arbeit/Kapitel/Studie/    Introduction Paper/Chapter/Study/
                                 Abschnitt: Beginn; Thema nennen      Section: Beginning; Name topic
Keyword(s)                       Frage                                Question
Phrases (e.g.)                   - um diese Frage zu beantworten...   - in order to answer this question...
                                 - Antwort auf diese Frage            - the answer to this question...
                                 - zur Beantwortung dieser Frage      - to answer this question...
                                 - die Frage, ob                      - the question whether...

3. Discussion and conclusion 

Although the testing of Thesis Writer by users (i.e. students) is still in preparation,
the functionality of the linguistic tools rests on several hypotheses:

• Support during writer’s block: We anticipate that the simple searches
will be especially useful to L2 writers during writer’s-block stages of thesis
writing. Students who have a more or less definite idea of the line of
argument they want to follow at a certain phase of the thesis may still
have difficulty finding the right words or phrases. They can then consult
the discipline-specific corpora to find out which constructions would fit
their needs. We do not intend students to use this option as a copy-paste
procedure, and we would like to prevent this by programming the searches
to return only a limited number of words to the left and right of the
search term.

• Rhetorical support: Students sometimes lack rhetorical awareness of a
specific academic genre. TW can help them identify the right argumentative
or academic phrase at the time and place they need it.

• Support for linguistic diversity in students’ writing: Scholars often warn
against the use of academic phrase lists, since they might inhibit creativity
and encourage repetition in student writing. However, we anticipate that
the diversity of research-cycle-based academic phrases extracted from the
corpus (supplemented by the free corpus search, from which students can
take inspiration for creating their own repertoire of academic phrases) will
be evaluated positively by users.

9. For simple queries, WordSmith Tools (V. 6) (Scott, 2012) were used.